Mathematical models for water environment

Author

Guangyao Zhao

Published

May 12, 2023

1 What is a model

There is no philosophical consensus on the definition of a model. For our purposes, we’ll go with this definition: a mathematical model consists of variables and functions (Molnar, 2022).

Molnar, C., 2022. Interpretable machine learning: A guide for making black box models explainable, Second edition. ed. Christoph Molnar, Munich, Germany.

Variables represent aspects of the data and the model.
Functions relate the variables to each other.

Functions range from simple, like \(y = 5\cdot x\) (only one variable), to complex, like a deep neural network with millions or even billions of parameters (GPT-3, for example, has 175 billion).
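To make the distinction between variables and functions concrete, here is a minimal sketch of the simple case \(y = 5\cdot x\). The function name `linear_model` and the keyword `slope` are illustrative choices, not part of any library.

```python
def linear_model(x, slope=5.0):
    """Return y = slope * x; x is the variable, slope is the single parameter."""
    return slope * x


# Evaluate the model for one value of the variable x
y = linear_model(2.0)
print(y)  # 10.0
```

A deep neural network has the same shape (inputs in, outputs out), only with a far larger set of parameters in place of the single `slope`.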

2 First-principles model

Theoretical work is said to be from first principles when it starts directly from established laws. For instance, the hydrolysis processes of the activated sludge models (Henze et al., 2015) are described as follows.

Henze, M., Gujer, W., Mino, T., van Loosdrecht, M., 2015. Activated sludge models ASM1, ASM2, ASM2d and ASM3. Water Intelligence Online 5, 9781780402369–9781780402369. https://doi.org/10.2166/9781780402369

Aerobic hydrolysis:

\[ K_{\mathrm{h}} \cdot \frac{S_{\mathrm{O}_{2}}}{K_{\mathrm{O}_{2}}+S_{\mathrm{O}_{2}}} \cdot \frac{X_{\mathrm{S}} / X_{\mathrm{H}}}{K_{\mathrm{X}}+X_{\mathrm{S}} / X_{\mathrm{H}}} \cdot X_{\mathrm{H}} \tag{1}\]

Anoxic hydrolysis:

\[ K_{\mathrm{h}} \cdot \eta_{\mathrm{NO}_{3}} \cdot \frac{K_{\mathrm{O}_{2}}}{K_{\mathrm{O}_{2}}+S_{\mathrm{O}_{2}}} \cdot \frac{S_{\mathrm{NO}_{3}}}{K_{\mathrm{NO}_{3}}+S_{\mathrm{NO}_{3}}} \cdot \frac{X_{\mathrm{S}} / X_{\mathrm{H}}}{K_{\mathrm{X}}+X_{\mathrm{S}} / X_{\mathrm{H}}} \cdot X_{\mathrm{H}} \tag{2}\]

Anaerobic hydrolysis:

\[ K_{\mathrm{h}} \cdot \eta_{\mathrm{fe}} \cdot \frac{K_{\mathrm{O}_{2}}}{K_{\mathrm{O}_{2}}+S_{\mathrm{O}_{2}}} \cdot \frac{K_{\mathrm{NO}_{3}}}{K_{\mathrm{NO}_{3}}+S_{\mathrm{NO}_{3}}} \cdot \frac{X_{\mathrm{S}} / X_{\mathrm{H}}}{K_{\mathrm{X}}+X_{\mathrm{S}} / X_{\mathrm{H}}} \cdot X_{\mathrm{H}} \tag{3}\]

3 Data-driven model

Machine learning (ML) is a subfield of Artificial Intelligence (AI) that deals with systems that are able to acquire their own “knowledge” by extracting patterns from data.

Linear regression (LR)
Tree-based models: decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), LightGBM (LGBM)
Support vector regression (SVR)
Artificial neural network (ANN)
- Recurrent neural network (RNN)
- Long short-term memory (LSTM)
- Convolutional neural network (CNN)
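As a small illustration of "extracting patterns from data", the sketch below fits the simplest model in the list, linear regression with one feature, by ordinary least squares. The function name `fit_linear_regression` and the observations are hypothetical; the data are generated from \(y = 2x + 1\) without noise so the recovered parameters can be checked by eye.

```python
def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that follow y = 2x + 1 exactly
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear_regression(xs, ys)
print(slope, intercept)
```

Nothing about the relationship between `xs` and `ys` is supplied in advance; the slope and intercept come from the data alone, which is what separates this approach from a first-principles model.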

4 Hybrid model

A hybrid model is a framework that incorporates both (incomplete) knowledge and data.

Dynamics that are not modeled explicitly by the first-principles component are captured by the machine learning component, thereby filling in knowledge gaps (Quaghebeur et al., 2021).

Quaghebeur, W., Nopens, I., De Baets, B., 2021. Incorporating unmodeled dynamics into first-principles models through machine learning. IEEE Access 9, 22014–22022. https://doi.org/10.1109/ACCESS.2021.3055353

First-principles:

\[ \frac{\mathrm{d}^{k} \mathbf{X}(t)}{\mathrm{d} t^{k}} = f\left(\mathbf{X}(t);\mathbf{p}\right), \tag{4}\]

Data-driven:

\[ \frac{\mathrm{d}^{k} \mathbf{X}(t)}{\mathrm{d} t^{k}} = n\left(\mathbf{X}(t), \mathbf{Y}(t);\mathbf{w}\right), \tag{5}\]

Hybrid:

\[ \frac{\mathrm{d}^{k} \mathbf{X}(t)}{\mathrm{d} t^{k}} = f\left(\mathbf{X}(t);\mathbf{p}\right) + n\left(\mathbf{X}(t), \mathbf{Y}(t);\mathbf{w}\right). \tag{6}\]
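The structure of Eq. 6 can be sketched in a few lines for the first-order case (k = 1) with a scalar state. Everything concrete here is an assumption made for illustration: the first-principles term is taken to be first-order decay, the data-driven term is a linear stand-in for what would normally be a trained neural network, Y is treated as a constant external input, and the weights are set by hand rather than learned.

```python
def first_principles(x, p):
    """Known dynamics f(X; p): here, first-order decay with rate constant p."""
    return -p * x


def data_driven(x, y, w):
    """Stand-in for the learned term n(X, Y; w): a linear model in X and Y."""
    return w[0] * x + w[1] * y


def hybrid_rhs(x, y, p, w):
    """Right-hand side of Eq. 6 for k = 1: f(X; p) + n(X, Y; w)."""
    return first_principles(x, p) + data_driven(x, y, w)


def simulate(x0, y, p, w, dt=0.01, steps=1000):
    """Integrate dX/dt = f + n with the explicit Euler method."""
    x = x0
    for _ in range(steps):
        x += dt * hybrid_rhs(x, y, p, w)
    return x


# Hypothetical weights: the correction term adds a constant source 0.5 * Y
x_hybrid = simulate(x0=5.0, y=2.0, p=1.0, w=(0.0, 0.5))
# With zero weights the hybrid model reduces to the first-principles model
x_physics_only = simulate(x0=5.0, y=2.0, p=1.0, w=(0.0, 0.0))
print(x_hybrid, x_physics_only)
```

With these hand-picked values the decay-only model relaxes toward zero, while the hybrid model settles near 1 because the added term supplies a source that the first-principles component does not describe.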

5 Model implementations

Programming language

Python
- Package management: Anaconda
- Data manipulation: pandas (the name derives from "panel data")
- Scientific computing: NumPy (Numerical Python)
- Visualization: Matplotlib (a MATLAB-style plotting library)
- Third-party packages for machine learning: scikit-learn, PyTorch
R, Julia, MATLAB

Example

We can implement Eq. 1 in Python as follows:

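# Eq. 1 inside a model class; matching terms with the equation suggests that
# substrate[0] is S_O2, substrate[10] is X_S and substrate[11] is X_H.
rate = (
    self.XS_Kh * substrate[0] / (self.XS_KO + substrate[0])
    * (substrate[10] / substrate[11])
    / (self.XS_KXS + (substrate[10] / substrate[11]))
    * substrate[11]
)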
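The snippet above depends on a surrounding class. A self-contained sketch of Eq. 1 and Eq. 2 as plain functions could look like the following. The function names and argument names are illustrative, and the parameter values in the usage lines are placeholders chosen to make the arithmetic easy to follow, not calibrated constants.

```python
def monod(s, k):
    """Saturation (Monod) term s / (k + s)."""
    return s / (k + s)


def inhibition(s, k):
    """Inhibition term k / (k + s)."""
    return k / (k + s)


def aerobic_hydrolysis(s_o2, x_s, x_h, k_h, k_o2, k_x):
    """Process rate of Eq. 1."""
    return k_h * monod(s_o2, k_o2) * monod(x_s / x_h, k_x) * x_h


def anoxic_hydrolysis(s_o2, s_no3, x_s, x_h, k_h, eta_no3, k_o2, k_no3, k_x):
    """Process rate of Eq. 2: oxygen inhibits, nitrate is required."""
    return (k_h * eta_no3 * inhibition(s_o2, k_o2) * monod(s_no3, k_no3)
            * monod(x_s / x_h, k_x) * x_h)


# Placeholder values: each concentration equals its half-saturation constant,
# so every Monod and inhibition term evaluates to 0.5.
r_aerobic = aerobic_hydrolysis(s_o2=0.2, x_s=10.0, x_h=100.0,
                               k_h=3.0, k_o2=0.2, k_x=0.1)
r_anoxic = anoxic_hydrolysis(s_o2=0.2, s_no3=0.5, x_s=10.0, x_h=100.0,
                             k_h=3.0, eta_no3=0.6, k_o2=0.2, k_no3=0.5, k_x=0.1)
print(r_aerobic, r_anoxic)
```

Factoring the saturation and inhibition terms into helpers keeps each rate expression visually close to its equation, which makes a mistyped numerator easier to spot.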

6 Application programming interface (API)

Data cleaning with pandas

import pandas as pd

# tmp_file_path: path to a raw text file whose fields are separated by ";"
dataset = (
    pd.read_csv(tmp_file_path, header=None, index_col=None).iloc[:, 0]
    .str.split(";", expand=True))

Random forest implemented with scikit-learn

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)
regr = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)

print(regr.predict([[0, 0, 0, 0]]))

7 Software

Editor:
- VS Code (recommended)
- Sublime Text
Reference management:
- Zotero (recommended)
- Mendeley
- EndNote
Markdown:
- Obsidian (recommended)
- Notion (recommended)
- Typora
Typesetting:
- Quarto (recommended)
- R Markdown
- LaTeX